Preprocessing

Since the data is so large, we prepare some filters and computations in advance and load the processed data.

The filters applied to the data are:

Within the crime data, we filter out major property crimes (robberies, burglaries, larceny, or theft) and violent crimes (assaults, homicides, shootings).

The thefts and violent crimes data is then aggregated spatially in two ways:

  1. Count the number of crimes within 150 meters of each parcel
  2. Compute kernel density estimation of crimes within a bandwidth of each parcel, where bandwidth is selected via cross-validation
# # Aggregate crime data spatially in two ways:
# # 1. Count number of crimes within 150 meters of each parcel
# # 2. Compute kernel density estimation of crimes within 150 meters of each 
# #    parcel
# # Filter major property crimes and violent crimes
# stealin <- z[grep("ROBBERY|BURGLARY|LARCENY|THEFT",
#                    z$UCR_1_Category), ]
# hurtin <- z[grep("ASSAULT|HOMICIDE|SHOOTING", z$UCR_1_Category), ]
# # Get number of thefts and violent crimes within 150 meters of each parcel
# p$Thefts <- st_is_within_distance(st_transform(p, crs = 26956),
#                                   st_transform(stealin, crs = 26956),
#                                   dist = units::set_units(150, "m")) |>
#   sapply(length)
# p$Violence <- st_is_within_distance(st_transform(p, crs = 26956),
#                                     st_transform(hurtin, crs = 26956),
#                                     dist = units::set_units(150, "m")) |>
#   sapply(length)
# # Compute KDE of thefts and violent crimes
# # Bandwidth selection using cross-validation (Hscv)
# p_proj <- p |> st_transform(crs = 26956) |> st_centroid() |> st_coordinates()
# v_proj <- hurtin |> st_transform(crs = 26956) |> st_coordinates()
# p$Violence_KDE <- kde(v_proj, Hscv(v_proj)) |> predict(x = p_proj)
# t_proj <- stealin |> st_transform(crs = 26956) |> st_coordinates()
# p$Thefts_KDE <- kde(t_proj, Hscv(t_proj)) |> predict(x = p_proj)
# 
# # Save parcel data with computed crime statistics
# st_write(p, ".//data//parcel_hartford_single_family_crimestats.geojson")
# Read in filtered Hartford crime and parcel data
z <- st_read(".//data//crime_hartford_2016_2021.geojson")
## Reading layer `crime_hartford_2016_2021' from data source 
##   `C:\Users\llint\OneDrive - Yale University\classes\625\CT Property\SDS625\data\crime_hartford_2016_2021.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 197060 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -72.71865 ymin: 41.72403 xmax: -72.65041 ymax: 41.80719
## Geodetic CRS:  WGS 84
p <- st_read(".//data//parcel_hartford_single_family_crimestats.geojson")
## Reading layer `parcel_hartford_single_family_crimestats' from data source 
##   `C:\Users\llint\OneDrive - Yale University\classes\625\CT Property\SDS625\data\parcel_hartford_single_family_crimestats.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 7239 features and 51 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -72.71637 ymin: 41.72374 xmax: -72.65809 ymax: 41.80743
## Geodetic CRS:  WGS 84
# Get polygons for Hartford's census tracts
hartford_tracts <- st_filter(tracts(state = "CT"),
                             subset(county_subdivisions(state = "CT"), 
                                    NAMELSAD == "Hartford town"),
                             .predicate = st_within) |>
   st_transform(crs = st_crs(z))
## Retrieving data for the year 2021
## Retrieving data for the year 2021
# Re-order condition description of parcels
p$Condition_Description <- factor(
  as.character(p$Condition_Description), 
  levels = c("Dilapidated", "Very Poor", "Poor", "Fair", "Fair-Avg", "Average", 
             "Avg-Good", "Good", "Good-VG", "Very Good", "Excellent"))

# Select predictors and filter out NA's
X <- p[, c("OBJECTID", "Assessed_Total", 
           "Thefts", "Violence", "Thefts_KDE", "Violence_KDE", 
           "Living_Area", "Effective_Area", "AYB", 
           "Number_of_Bedroom", "Number_of_Baths", 
           "Condition_Description")] %>%
  rename(Price = Assessed_Total, Year = AYB, 
         Bed = Number_of_Bedroom, Bath = Number_of_Baths,
         Condition = Condition_Description) %>%
  na.omit()
# Check that we haven't dropped too many rows
nrow(p) # 7239 rows
## [1] 7239
nrow(X) # 7220 rows
## [1] 7220

Visualization

The manually curated variables are:

The highly correlated numeric covariates are

Overall, there does not seem to be a strong correlation between A couple numeric covariates appear to be correlated with the condition of the parcel:

All covariates appear to correlate with the response variable, Assessed_Total.

# Plot pairwise correlation between numeric predictors
ggpairs(X, columns = c(2:10), lower = list(continuous = "points"), 
        progress = F) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Plot boxplots of numeric predictors with respect to condition description
# Pivot data longer so that we can facet by variable
LX <- X %>%
  pivot_longer(cols = c(Thefts, Violence, Thefts_KDE, Violence_KDE,
                        Living_Area, Effective_Area, 
                        Year, Bed, Bath, Price), 
               names_to = "variable", values_to = "value")
ggplot(data = LX, aes(x = variable, y = value)) + 
  geom_boxplot(aes(fill = Condition), position = position_dodge(1)) + 
  facet_wrap( ~ variable, scales = "free", ) + 
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.x = element_blank(), panel.grid.major.x = element_blank(),
        axis.title.y = element_blank())

Analysis

1. Crime counts

We first consider the Thefts and Violence variables computed by counting crime occurrences within a 150-meter radius of a parcel.

Variable Selection. We leave out Thefts since it is highly correlated with Violence but has a lower correlation with Price than Violence does. For similar reasons, we leave out Effective_Area and keep Living_Area. We keep both Bed and Bath for now.

Transformations. From the correlation plots, we can see that Price, Living_Area, and Violence are right-skewed. To adjust for this, we take the log of Price and the square root of Living_Area and Violence. The residual plots look much better for the transformed model.

# Fit linear models with no transformations
m0 <- lm(Price ~ Violence + Living_Area + Year + Bed + Bath + Condition, 
         data = X)
summary(m0)
## 
## Call:
## lm(formula = Price ~ Violence + Living_Area + Year + Bed + Bath + 
##     Condition, data = X)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -75062  -7023    856   6819 110854 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -2.458e+05  1.440e+04 -17.063  < 2e-16 ***
## Violence           -2.521e+02  6.990e+00 -36.065  < 2e-16 ***
## Living_Area         3.349e+01  3.255e-01 102.893  < 2e-16 ***
## Year                1.132e+02  6.656e+00  17.002  < 2e-16 ***
## Bed                -2.306e+03  2.293e+02 -10.061  < 2e-16 ***
## Bath                5.426e+03  3.556e+02  15.258  < 2e-16 ***
## ConditionVery Poor  6.029e+02  8.696e+03   0.069   0.9447    
## ConditionPoor       1.345e+04  6.813e+03   1.974   0.0484 *  
## ConditionFair       2.685e+04  6.408e+03   4.190 2.82e-05 ***
## ConditionFair-Avg   3.280e+04  6.258e+03   5.241 1.65e-07 ***
## ConditionAverage    4.180e+04  6.163e+03   6.784 1.27e-11 ***
## ConditionAvg-Good   4.936e+04  6.158e+03   8.017 1.26e-15 ***
## ConditionGood       5.025e+04  6.157e+03   8.161 3.90e-16 ***
## ConditionGood-VG    5.442e+04  6.176e+03   8.810  < 2e-16 ***
## ConditionVery Good  5.704e+04  6.179e+03   9.231  < 2e-16 ***
## ConditionExcellent  6.455e+04  6.330e+03  10.197  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13740 on 7204 degrees of freedom
## Multiple R-squared:  0.8252, Adjusted R-squared:  0.8248 
## F-statistic:  2267 on 15 and 7204 DF,  p-value: < 2.2e-16
# Plot diagnostics
par(mfrow = c(2, 2))
plot(m0)

# Fit linear models with transformations
m1 <- lm(log(Price) ~ sqrt(Violence) + sqrt(Living_Area) + Year + 
           Bed + Bath + Condition, 
         data = X)
summary(m1)
## 
## Call:
## lm(formula = log(Price) ~ sqrt(Violence) + sqrt(Living_Area) + 
##     Year + Bed + Bath + Condition, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27511 -0.07695  0.01528  0.09805  0.72980 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         6.884e+00  1.771e-01  38.860  < 2e-16 ***
## sqrt(Violence)     -4.028e-02  9.175e-04 -43.908  < 2e-16 ***
## sqrt(Living_Area)   2.823e-02  3.803e-04  74.234  < 2e-16 ***
## Year                9.114e-04  8.078e-05  11.282  < 2e-16 ***
## Bed                -9.644e-03  2.818e-03  -3.422 0.000626 ***
## Bath                4.061e-02  4.161e-03   9.759  < 2e-16 ***
## ConditionVery Poor  7.605e-01  1.045e-01   7.279 3.71e-13 ***
## ConditionPoor       9.360e-01  8.185e-02  11.436  < 2e-16 ***
## ConditionFair       1.068e+00  7.699e-02  13.875  < 2e-16 ***
## ConditionFair-Avg   1.131e+00  7.519e-02  15.043  < 2e-16 ***
## ConditionAverage    1.377e+00  7.404e-02  18.603  < 2e-16 ***
## ConditionAvg-Good   1.500e+00  7.398e-02  20.271  < 2e-16 ***
## ConditionGood       1.517e+00  7.398e-02  20.507  < 2e-16 ***
## ConditionGood-VG    1.552e+00  7.421e-02  20.911  < 2e-16 ***
## ConditionVery Good  1.580e+00  7.424e-02  21.277  < 2e-16 ***
## ConditionExcellent  1.650e+00  7.605e-02  21.702  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1651 on 7204 degrees of freedom
## Multiple R-squared:  0.7622, Adjusted R-squared:  0.7617 
## F-statistic:  1540 on 15 and 7204 DF,  p-value: < 2.2e-16
plot(m1)

There are 31 parcels beyond 4 standard errors below the regression line. This means the model is severely overestimating the value of these parcels. Let’s find out what these parcels are.

  • Geography: There is a cluster of 19 offending parcels near (-72.7, 41.75) in Census Tract 5049. The remaining parcels are scattered throughout Hartford. Interact with the map to see the details of each parcel.
  • Condition: The “large” residuals occur for parcels in better than Average condition, although the majority of them are Average or worse.
  • Numeric covariates: The residuals are highly correlated with the living area and number of bedrooms.

Solutions. We try the following fixes:

  • Log transform living area: Refer to the previous correlation plots and observe that Price and Living Area appear to have a strongly correlated linear correlation. However, we performed a log transformation on the Price variable but a square-root transformation on the Living Area variable. This asymmetry may be the cause of the large residuals.

  • Drop number of bedrooms: The number of bedrooms is highly correlated with the residuals, and we might infer that most of the information provided by this variable is already captured in the number of bathrooms.

  • Drop the 19 parcels: These parcels are in a small geographic area and may be subject to some unobserved spatial effect.

# Investigate the large negative residuals
r <- residuals(m1) / summary(m1)$sigma # standard residuals
o <- X[r < -4, ]
o$StdResid <- r[r < -4]
nrow(o)
## [1] 31
# Add a tooltip label for parcels
o <- o %>% mutate(label = paste0('Price: ', round(Price), '<br>',
                                 'Year: ', Year, '<br>',
                                 'Condition: ', Condition, '<br>',
                                 'Living Area: ', Living_Area, '<br>',
                                 'Bedrooms: ', Bed, '<br>', 
                                 'Bathrooms: ', Bath, '<br>',
                                 'Violence: ', Violence
                                 ))
# Plot the offending parcels geographically
g <- ggplot(o) +
  geom_sf(data = hartford_tracts, aes(text = NAME)) +
  geom_sf(data = o, 
          aes(text = label)) +
  stat_sf_coordinates(size = 1, color = "red") +
  labs(x = "Latitude", y = "Longitude") +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
toWebGL(ggplotly(g))
# Plot residuals vs. Condition
# Function for number of observations 
give.n <- function(x){
  return(data.frame(
    y = quantile(x, .75) + 0.1, 
    label = paste0("n = ", length(x))
    ))
}
ggplot(data = o, aes(x = Condition, y = StdResid)) + 
  geom_boxplot() + 
  stat_summary(fun.data = give.n, geom = "text", fun.y = median) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  theme_bw()

# Plot residuals vs transformed numeric covariates
o <- o %>%
  mutate(logPrice = log(Price), 
         sqrtViolence = sqrt(Violence),
         sqrtLiving_Area = sqrt(Living_Area))
ggpairs(o, columns = c("StdResid", "sqrtViolence", "sqrtLiving_Area", 
                       "Year", "Bed", "Bath"), 
        lower = list(continuous = "points"),
        progress = F) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Let’s test our hypotheses.

  • After taking the log of Living_Area, the residuals are more symmetrically distributed. We may have sacrificed some upper tail normality for the lower tail, since there appear to be more residuals larger than 4 now. However, we have reduced the total number of residuals beyond 4 SE’s from 33 to 21. Notice that the Beds variable is no longer statistically significant.
  • Dropping the number of bedrooms slightly increases adjusted \(R^2\).
  • Dropping the 19 parcels increases the adjusted \(R^2\) to 0.76 and further decreases the number of residuals beyond 4 SE’s to 5. Thus, these parcels represented a significant source of error in the model and might be a good point of further investigation.
# Take log of living area
m2 <- lm(log(Price) ~ sqrt(Violence) + log(Living_Area) + Year + 
           Bed + Bath + Condition, 
         data = X)
summary(m2)
## 
## Call:
## lm(formula = log(Price) ~ sqrt(Violence) + log(Living_Area) + 
##     Year + Bed + Bath + Condition, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.28251 -0.08783  0.01367  0.10248  1.09006 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.174e+00  2.016e-01  20.709  < 2e-16 ***
## sqrt(Violence)     -4.364e-02  9.532e-04 -45.777  < 2e-16 ***
## log(Living_Area)    5.548e-01  8.271e-03  67.070  < 2e-16 ***
## Year                7.687e-04  8.401e-05   9.150  < 2e-16 ***
## Bed                -8.052e-04  2.920e-03  -0.276    0.783    
## Bath                7.505e-02  4.174e-03  17.982  < 2e-16 ***
## ConditionVery Poor  7.686e-01  1.089e-01   7.059 1.84e-12 ***
## ConditionPoor       9.315e-01  8.531e-02  10.919  < 2e-16 ***
## ConditionFair       1.068e+00  8.025e-02  13.305  < 2e-16 ***
## ConditionFair-Avg   1.125e+00  7.838e-02  14.354  < 2e-16 ***
## ConditionAverage    1.365e+00  7.718e-02  17.691  < 2e-16 ***
## ConditionAvg-Good   1.485e+00  7.711e-02  19.252  < 2e-16 ***
## ConditionGood       1.500e+00  7.711e-02  19.450  < 2e-16 ***
## ConditionGood-VG    1.536e+00  7.735e-02  19.863  < 2e-16 ***
## ConditionVery Good  1.567e+00  7.738e-02  20.250  < 2e-16 ***
## ConditionExcellent  1.642e+00  7.927e-02  20.718  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1721 on 7204 degrees of freedom
## Multiple R-squared:  0.7417, Adjusted R-squared:  0.7411 
## F-statistic:  1379 on 15 and 7204 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(m2)

# Drop number of bedrooms
m3 <- lm(log(Price) ~ sqrt(Violence) + log(Living_Area) + Year + 
           Bath + Condition, 
         data = X)
summary(m3)
## 
## Call:
## lm(formula = log(Price) ~ sqrt(Violence) + log(Living_Area) + 
##     Year + Bath + Condition, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.28246 -0.08786  0.01381  0.10273  1.08541 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.180e+00  2.005e-01  20.844  < 2e-16 ***
## sqrt(Violence)     -4.366e-02  9.488e-04 -46.017  < 2e-16 ***
## log(Living_Area)    5.535e-01  6.998e-03  79.099  < 2e-16 ***
## Year                7.694e-04  8.396e-05   9.164  < 2e-16 ***
## Bath                7.485e-02  4.105e-03  18.231  < 2e-16 ***
## ConditionVery Poor  7.685e-01  1.089e-01   7.058 1.84e-12 ***
## ConditionPoor       9.311e-01  8.530e-02  10.916  < 2e-16 ***
## ConditionFair       1.067e+00  8.023e-02  13.303  < 2e-16 ***
## ConditionFair-Avg   1.125e+00  7.836e-02  14.352  < 2e-16 ***
## ConditionAverage    1.365e+00  7.716e-02  17.690  < 2e-16 ***
## ConditionAvg-Good   1.484e+00  7.710e-02  19.252  < 2e-16 ***
## ConditionGood       1.499e+00  7.710e-02  19.449  < 2e-16 ***
## ConditionGood-VG    1.536e+00  7.733e-02  19.863  < 2e-16 ***
## ConditionVery Good  1.567e+00  7.737e-02  20.249  < 2e-16 ***
## ConditionExcellent  1.642e+00  7.926e-02  20.718  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.172 on 7205 degrees of freedom
## Multiple R-squared:  0.7416, Adjusted R-squared:  0.7411 
## F-statistic:  1477 on 14 and 7205 DF,  p-value: < 2.2e-16
plot(m3)

# Drop the 19 outlier parcels
XO <- X[!X$OBJECTID %in% o$OBJECTID, ]
m4 <- lm(log(Price) ~ sqrt(Violence) + log(Living_Area) + Year + 
           Bath + Condition, 
         data = XO)
summary(m4)
## 
## Call:
## lm(formula = log(Price) ~ sqrt(Violence) + log(Living_Area) + 
##     Year + Bath + Condition, data = XO)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63568 -0.08865  0.01138  0.09956  1.04299 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.468e+00  1.920e-01  23.274  < 2e-16 ***
## sqrt(Violence)     -4.375e-02  9.059e-04 -48.297  < 2e-16 ***
## log(Living_Area)    5.430e-01  6.697e-03  81.076  < 2e-16 ***
## Year                6.595e-04  8.042e-05   8.202 2.78e-16 ***
## Bath                7.561e-02  3.916e-03  19.311  < 2e-16 ***
## ConditionVery Poor  7.680e-01  1.037e-01   7.403 1.49e-13 ***
## ConditionPoor       9.317e-01  8.127e-02  11.464  < 2e-16 ***
## ConditionFair       1.104e+00  7.661e-02  14.406  < 2e-16 ***
## ConditionFair-Avg   1.144e+00  7.470e-02  15.314  < 2e-16 ***
## ConditionAverage    1.375e+00  7.352e-02  18.702  < 2e-16 ***
## ConditionAvg-Good   1.485e+00  7.346e-02  20.215  < 2e-16 ***
## ConditionGood       1.501e+00  7.346e-02  20.434  < 2e-16 ***
## ConditionGood-VG    1.542e+00  7.368e-02  20.931  < 2e-16 ***
## ConditionVery Good  1.572e+00  7.372e-02  21.318  < 2e-16 ***
## ConditionExcellent  1.669e+00  7.557e-02  22.082  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1639 on 7174 degrees of freedom
## Multiple R-squared:  0.755,  Adjusted R-squared:  0.7545 
## F-statistic:  1579 on 14 and 7174 DF,  p-value: < 2.2e-16
plot(m4)

# Compare residuals beyond 4 SE's
r1 <- residuals(m1) / summary(m1)$sigma
sum(r1 < -4 | r1 > 4)
## [1] 33
r3 <- residuals(m3) / summary(m3)$sigma
sum(r3 < -4 | r3 > 4)
## [1] 22
r4 <- residuals(m4) / summary(m4)$sigma
sum(r4 < -4 | r4 > 4)
## [1] 5

2. Crime KDE

Replacing the Violence computed by counts with Violence_KDE computed by kernel density estimation increases the adjusted \(R^2\) and reduces residual standard error. From the plots, there is not a substantial change in the residuals. Thus, we can conclude that the KDE of violent crimes is a better predictor of property value than the count of violent crimes. Our final model is:

\[ \log \text{Price} = \beta_0 + \beta_1 \sqrt{\text{Violence_KDE}} + \beta_2 \log(\text{Living_Area}) + \beta_3 \text{Year} + \beta_4 \text{Bath} + \beta_5 \text{Condition} \]

m5 <- lm(log(Price) ~ sqrt(Violence_KDE) + log(Living_Area) + Year + 
           Bath + Condition, 
         data = XO)
summary(m5)
## 
## Call:
## lm(formula = log(Price) ~ sqrt(Violence_KDE) + log(Living_Area) + 
##     Year + Bath + Condition, data = XO)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63300 -0.09047  0.01031  0.09823  1.05442 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.645e+00  1.892e-01  24.557  < 2e-16 ***
## sqrt(Violence_KDE) -1.736e+03  3.370e+01 -51.514  < 2e-16 ***
## log(Living_Area)    5.434e-01  6.587e-03  82.505  < 2e-16 ***
## Year                5.973e-04  7.919e-05   7.543 5.15e-14 ***
## Bath                7.241e-02  3.856e-03  18.778  < 2e-16 ***
## ConditionVery Poor  7.661e-01  1.020e-01   7.508 6.71e-14 ***
## ConditionPoor       9.043e-01  7.995e-02  11.312  < 2e-16 ***
## ConditionFair       1.076e+00  7.536e-02  14.282  < 2e-16 ***
## ConditionFair-Avg   1.118e+00  7.349e-02  15.216  < 2e-16 ***
## ConditionAverage    1.352e+00  7.232e-02  18.693  < 2e-16 ***
## ConditionAvg-Good   1.459e+00  7.226e-02  20.191  < 2e-16 ***
## ConditionGood       1.474e+00  7.226e-02  20.404  < 2e-16 ***
## ConditionGood-VG    1.518e+00  7.248e-02  20.941  < 2e-16 ***
## ConditionVery Good  1.548e+00  7.252e-02  21.346  < 2e-16 ***
## ConditionExcellent  1.650e+00  7.434e-02  22.192  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1612 on 7174 degrees of freedom
## Multiple R-squared:  0.763,  Adjusted R-squared:  0.7625 
## F-statistic:  1649 on 14 and 7174 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(m5)

References

Spatial regression

https://oerstatistics.wordpress.com/wp-content/uploads/2016/03/intro_to_r.pdf#page=68.08

https://crd230.github.io/lab8.html

Kernel density estimation

https://seeing-statistics.com/issue4/